ggplot2, dplyr, tidyr
Let’s load them:
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
dplyr
tidyr
gather() and spread()
dplyr
dplyr provides several functions for manipulating data frames, e.g.,
select(), filter(), mutate(), rename()
Remember the mtcars dataset from yesterday?
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
select() selects specific columns from a data frame:
mtcars2 <- select(mtcars, mpg, cyl, disp)
str(mtcars2)
## 'data.frame': 32 obs. of 3 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
select() can also be used to drop specific columns, with -:
mtcars3 <- select(mtcars, -am, -gear, -carb)
str(mtcars3)
## 'data.frame': 32 obs. of 8 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
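select() also accepts column ranges and helper functions, which can save typing on wide data frames. The sketch below is an illustration only; the object names mtcars_range and mtcars_d are made up for this example.

```r
library(dplyr)

# Select a contiguous range of columns by name (mpg through disp)
mtcars_range <- select(mtcars, mpg:disp)
names(mtcars_range)

# Select every column whose name starts with "d"
mtcars_d <- select(mtcars, starts_with("d"))
names(mtcars_d)
```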
rename() renames specified columns. Use rename(data, newname = oldname). You can rename as many columns as you want in one rename() call.
mtcars4 <- rename(mtcars, ncyl = cyl, weight = wt)
str(mtcars4)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ ncyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ weight: num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : num 4 4 1 1 2 1 4 2 2 4 ...
filter() selects rows matching given conditions.
mtcars5 <- filter(mtcars, cyl == 6, am == 1)
mtcars5
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 19.7 6 145 175 3.62 2.770 15.50 0 1 5 6
Multiple conditions can be specified for the same variable.
mtcars6 <- filter(mtcars, hp > 100, hp < 200)
str(mtcars6)
## 'data.frame': 16 obs. of 11 variables:
## $ mpg : num 21 21 21.4 18.7 18.1 19.2 17.8 16.4 17.3 15.2 ...
## $ cyl : num 6 6 6 8 6 6 6 8 8 8 ...
## $ disp: num 160 160 258 360 225 ...
## $ hp : num 110 110 110 175 105 123 123 180 180 180 ...
## $ drat: num 3.9 3.9 3.08 3.15 2.76 3.92 3.92 3.07 3.07 3.07 ...
## $ wt : num 2.62 2.88 3.21 3.44 3.46 ...
## $ qsec: num 16.5 17 19.4 17 20.2 ...
## $ vs : num 0 0 1 0 1 1 1 0 0 0 ...
## $ am : num 1 1 0 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 3 3 3 4 4 3 3 3 ...
## $ carb: num 4 4 1 2 1 4 4 3 3 3 ...
x < y - x is less than y
x > y - x is greater than y
x <= y - x is less than or equal to y
x >= y - x is greater than or equal to y
x == y - x is equal to y
x != y - x is not equal to y
For matching values to a vector, use %in%:
mtcars7 <- filter(mtcars, cyl %in% c(6, 8))
str(mtcars7)
## 'data.frame': 21 obs. of 11 variables:
## $ mpg : num 21 21 21.4 18.7 18.1 14.3 19.2 17.8 16.4 17.3 ...
## $ cyl : num 6 6 6 8 6 8 6 6 8 8 ...
## $ disp: num 160 160 258 360 225 ...
## $ hp : num 110 110 110 175 105 245 123 123 180 180 ...
## $ drat: num 3.9 3.9 3.08 3.15 2.76 3.21 3.92 3.92 3.07 3.07 ...
## $ wt : num 2.62 2.88 3.21 3.44 3.46 ...
## $ qsec: num 16.5 17 19.4 17 20.2 ...
## $ vs : num 0 0 1 0 1 0 1 1 0 0 ...
## $ am : num 1 1 0 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 3 3 3 3 4 4 3 3 ...
## $ carb: num 4 4 1 2 1 4 4 4 3 3 ...
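Conditions separated by commas in filter() are combined with AND. For OR or NOT, the logical operators | and ! can be used inside a single condition. A minimal sketch, with object names chosen only for this illustration:

```r
library(dplyr)

# Comma-separated conditions must all be true (AND)
and_rows <- filter(mtcars, cyl == 4, am == 1)

# | keeps rows where either condition is true (OR)
or_rows <- filter(mtcars, cyl == 4 | am == 1)

# ! negates a condition (NOT): here, cars with neither 6 nor 8 cylinders
not_rows <- filter(mtcars, !(cyl %in% c(6, 8)))

nrow(and_rows)
nrow(or_rows)
nrow(not_rows)
```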
mutate() adds new columns, preserving all previous ones.
mtcars8 <- mutate(mtcars, displ_l = disp / 61.0237)
str(mtcars8)
## 'data.frame': 32 obs. of 12 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : num 4 4 1 1 2 1 4 2 2 4 ...
## $ displ_l: num 2.62 2.62 1.77 4.23 5.9 ...
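mutate() can create several columns in one call, and later columns can refer to ones created earlier in the same call. The following is a sketch only; the names wt_kg and hp_per_l are invented for this example, and the weight conversion assumes that wt is given in units of 1000 lb, as described in ?mtcars.

```r
library(dplyr)

mtcars_metric <- mutate(
  mtcars,
  displ_l  = disp / 61.0237,  # cubic inches to litres
  wt_kg    = wt * 453.592,    # wt is in 1000 lb; 1000 lb is about 453.592 kg
  hp_per_l = hp / displ_l     # uses the displ_l column created above
)
str(mtcars_metric)
```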
Note: all dplyr methods ignore row names in data frames (on purpose). If you have them and want to keep them, they have to be converted to an explicit variable. The tibble package (loaded with the tidyverse) has a function for this, rownames_to_column():
has_rownames(mtcars) # check if rownames exist
## [1] TRUE
mtcars9 <- rownames_to_column(mtcars, "name")
str(mtcars9)
## 'data.frame': 32 obs. of 12 variables:
## $ name: chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Look at the iris dataset (e.g., View(iris)).
Use filter() to make a new data frame df, for only the “setosa” species with a sepal length between 5 and 6.5 cm.
If you’ve managed this just fine, try out the other functions: rename(), select(), mutate()
Your code should look something like this:
df <- filter(iris, Species == "setosa", Sepal.Length >= 5, Sepal.Length <= 6.5)
head(df)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 5.0 3.6 1.4 0.2 setosa
## 3 5.4 3.9 1.7 0.4 setosa
## 4 5.0 3.4 1.5 0.2 setosa
## 5 5.4 3.7 1.5 0.2 setosa
## 6 5.8 4.0 1.2 0.2 setosa
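As an aside, the two range conditions in the exercise answer can also be written with dplyr's between() helper, which includes both end points. This is just an alternative sketch; df2 is a name made up for the comparison.

```r
library(dplyr)

# The explicit version from the exercise answer
df <- filter(iris, Species == "setosa", Sepal.Length >= 5, Sepal.Length <= 6.5)

# The same filter using between(), which is inclusive at both ends
df2 <- filter(iris, Species == "setosa", between(Sepal.Length, 5, 6.5))

# Check that both approaches give the same rows
all.equal(df, df2)
```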
ggplot()…
diamonds dataset
There is a built-in dataset in the ggplot2 package called diamonds.
str(diamonds)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
(Try also ?diamonds to find out more.)
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point()
Use aes() to map properties to variables:
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut)) # <<----
Properties not mapped to variables should not be inside aes():
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) # <<----
A plot can be assigned to a variable with <-:
p <- ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) # <<----
print(p)
str(p)
## List of 9
## $ data :Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## ..$ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## ..$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## ..$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## ..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## ..$ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## ..$ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## ..$ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## ..$ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## ..$ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## ..$ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
## $ layers :List of 1
## ..$ :Classes 'LayerInstance', 'Layer', 'ggproto' <ggproto object: Class LayerInstance, Layer>
## aes_params: list
## compute_aesthetics: function
## compute_geom_1: function
## compute_geom_2: function
## compute_position: function
## compute_statistic: function
## data: waiver
## draw_geom: function
## geom: <ggproto object: Class GeomPoint, Geom>
## aesthetics: function
## default_aes: uneval
## draw_group: function
## draw_key: function
## draw_layer: function
## draw_panel: function
## extra_params: na.rm
## handle_na: function
## non_missing_aes: size shape
## parameters: function
## required_aes: x y
## setup_data: function
## use_defaults: function
## super: <ggproto object: Class Geom>
## geom_params: list
## inherit.aes: TRUE
## layer_data: function
## mapping: uneval
## map_statistic: function
## position: <ggproto object: Class PositionIdentity, Position>
## compute_layer: function
## compute_panel: function
## required_aes:
## setup_data: function
## setup_params: function
## super: <ggproto object: Class Position>
## print: function
## show.legend: NA
## stat: <ggproto object: Class StatIdentity, Stat>
## compute_group: function
## compute_layer: function
## compute_panel: function
## default_aes: uneval
## extra_params: na.rm
## non_missing_aes:
## parameters: function
## required_aes:
## retransform: TRUE
## setup_data: function
## setup_params: function
## super: <ggproto object: Class Stat>
## stat_params: list
## subset: NULL
## super: <ggproto object: Class Layer>
## $ scales :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList>
## $ mapping :List of 2
## ..$ x: symbol carat
## ..$ y: symbol price
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
## aspect: function
## distance: function
## expand: TRUE
## is_linear: function
## labels: function
## limits: list
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## train: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord>
## $ facet :List of 1
## ..$ shrink: logi TRUE
## ..- attr(*, "class")= chr [1:2] "null" "facet"
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## ..$ x : chr "carat"
## ..$ y : chr "price"
## ..$ colour: chr "cut"
## - attr(*, "class")= chr [1:2] "gg" "ggplot"
p <- p + ggtitle("diamonds")
print(p)
There is a very large number of functions for modifying how properties such as x, y, size, shape, alpha, color and fill are mapped to variables.
They all begin with scale_…
Axes can be scaled with, for example, scale_x_continuous() and scale_y_continuous(). Setting limits in these functions removes values outside the range:
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) +
scale_x_continuous(limits = c(0, 3)) + # <<----
scale_y_continuous(limits = c(0, 10000)) # <<----
An alternative is to use coord_cartesian(), which zooms the plot window without discarding any data; this can be useful sometimes:
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) +
coord_cartesian(xlim = c(0, 3), ylim = c(0, 10000)) # <<----
There are useful functions for changing colour schemes based on specially suited colour palettes:
scale_color_brewer() and scale_fill_brewer() for discrete data
scale_color_distiller() and scale_fill_distiller() for continuous data
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) +
scale_color_brewer(palette = "Set1") # <<----
Legend title and labels can be changed from within the scale_ function:
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) +
scale_color_brewer("Grade", palette = "Set1", labels = c("E", "D", "C", "B", "A")) # <<----
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = x * y), size = 2, alpha = 0.6) +
scale_color_distiller(palette = "YlGnBu", limits = c(10, 120)) # <<----
… and many more possibilities:
scale_x_log10()
scale_size_discrete()
scale_fill_continuous()
scale_alpha_manual()
and so on…
I recommend reading the ggplot2 documentation to learn more!
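As one illustration of the other scale_ functions listed above, logarithmic axes can be combined with a colour scale in the same plot. This is a sketch rather than a recommendation; the object name p_log and the legend title "Cut" are chosen only for this example.

```r
library(ggplot2)

p_log <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) +
  scale_x_log10() +                           # log-scaled x axis
  scale_y_log10() +                           # log-scaled y axis
  scale_color_brewer("Cut", palette = "Set1") # legend title and palette
print(p_log)
```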
Use xlab() and ylab() to label axes:
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6) +
scale_x_continuous(limits = c(0, 3)) +
scale_y_continuous(limits = c(0, 10000)) +
xlab("weight / carats") + # <<----
ylab("price / USD") # <<----
diamonds2 <- filter(diamonds, cut %in% c("Premium", "Ideal"),
clarity %in% c("VVS1", "VVS2", "IF"))
p2 <- ggplot(diamonds2, aes(x = carat, y = price)) +
geom_point(aes(color = color), size = 2, alpha = 0.5)
print(p2)
Create panels for each value of a variable with facet_grid(rows ~ columns):
p2 + facet_grid(. ~ cut)
p2 + facet_grid(clarity ~ cut)
Use facet_wrap() if you have a 1-dimensional sequence of panels and want to wrap it into a fixed number of rows or columns:
p2 + facet_wrap( ~ color, ncol = 4)
A note about factors:
A factor is a “category variable” - it is only allowed to have certain values (levels).
df_a contains battery testing data for cycles 5, 10, 20 of a battery.
str(df_a)
## 'data.frame': 1601 obs. of 7 variables:
## $ step.n: int 1 1 1 1 1 2 2 2 2 2 ...
## $ step.t: num 2.03 4.06 6.08 8.11 10.01 ...
## $ cyc.n : int 5 5 5 5 5 5 5 5 5 5 ...
## $ I : num 0 0 0 0 0 ...
## $ E : num 2.58 2.57 2.56 2.55 2.54 ...
## $ Q.d : num NA NA NA NA NA ...
## $ Q.c : num NA NA NA NA NA NA NA NA NA NA ...
If I plot voltage E vs charge (Q.d and Q.c) directly, colouring lines according to cycle number cyc.n:
ggplot(df_a) +
geom_path(aes(x = Q.d, y = E, color = cyc.n)) +
geom_path(aes(x = Q.c, y = E, color = cyc.n))
cyc.n, as a number, is assumed to be continuous data, and the colour scale is a gradient by default for this reason. I can get the behaviour I want by converting cyc.n to a factor directly in the plot:
ggplot(df_a) +
geom_path(aes(x = Q.d, y = E, color = factor(cyc.n))) +
geom_path(aes(x = Q.c, y = E, color = factor(cyc.n)))
I can also modify the data:
df_a$cyc.n <- factor(df_a$cyc.n, levels = c(5, 10, 20))
str(df_a)
## 'data.frame': 1601 obs. of 7 variables:
## $ step.n: int 1 1 1 1 1 2 2 2 2 2 ...
## $ step.t: num 2.03 4.06 6.08 8.11 10.01 ...
## $ cyc.n : Factor w/ 3 levels "5","10","20": 1 1 1 1 1 1 1 1 1 1 ...
## $ I : num 0 0 0 0 0 ...
## $ E : num 2.58 2.57 2.56 2.55 2.54 ...
## $ Q.d : num NA NA NA NA NA ...
## $ Q.c : num NA NA NA NA NA NA NA NA NA NA ...
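One thing to watch for after converting numbers to a factor: the factor stores integer level codes, not the original values. The small sketch below uses a made-up vector cyc (not the df_a data) to show the difference.

```r
# A made-up vector of cycle numbers, for illustration only
cyc <- c(5, 10, 20, 5, 20)
cyc_f <- factor(cyc, levels = c(5, 10, 20))

levels(cyc_f)                    # the allowed values, stored as text

# as.numeric() on a factor returns the level codes, not the cycle numbers
as.numeric(cyc_f)

# Convert via character to recover the original numbers
as.numeric(as.character(cyc_f))
```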
geoms
There are many - read the ggplot2 documentation!
Most useful:
geom_point(), geom_path(), geom_line(), geom_bar()
p2 + facet_wrap( ~ color, ncol = 4) +
theme_bw()
p2 + facet_wrap( ~ color, ncol = 4) +
theme_classic()
p2 + facet_wrap( ~ color, ncol = 4) +
theme_minimal()
theme_Lacey <- function(base_size=15, base_family="Lato Medium") {
library(grid)
library(ggthemes)
(theme_foundation(base_size=base_size, base_family=base_family)
+ theme(plot.title = element_text(size = rel(1.2), hjust = 0.5),
text = element_text(),
panel.background = element_rect(colour=NA),
plot.background = element_rect(fill = "transparent", colour=NA),
panel.border = element_rect(colour = NA),
axis.title = element_text(size = rel(1), colour="#333333", family = "Lato"),
axis.title.y = element_text(angle=90, colour="#333333", family = "Lato"),
axis.text = element_text(size = rel(0.8)),
axis.line.x = element_line(size=0.5, colour="#333333"),
axis.ticks.length=unit(-0.15, "cm"),
axis.text.x = element_text(margin = margin(0.5, 0, 0.2, 0, "cm"), colour="#666666"),
axis.text.y = element_text(margin = margin(0, 0.5, 0, 0.2, "cm"), colour="#666666"),
panel.grid.major = element_line(colour="#eaeaea", size = 0.5),
panel.grid.minor = element_blank(),
legend.key = element_rect(colour = NA),
legend.key.size = unit(0.6, "cm"),
legend.spacing = unit(0, "cm"),
strip.background=element_rect(colour="#eaeaea",fill="#eaeaea"),
strip.text = element_text(family = "Lato",
colour = "#333333", lineheight=0.7),
legend.text = element_text(family = "Lato", colour = "#333333")
))
}
p2 + facet_wrap( ~ color, ncol = 4) +
theme_Lacey()
dplyr -> ggplot
%>%
%>% is the “pipe” operator.
It comes from the magrittr package and is loaded automatically along with dplyr/tidyverse.
%>% “pipes” an object to the first argument of a function, i.e.:
x %>% f(y, z)
is the same as:
f(x, y, z)
This creates code which is easily read left-to-right.
For example:
diamonds %>%
ggplot(aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6)
filter(diamonds, cut %in% c("Premium", "Ideal")) %>%
ggplot(aes(x = carat, y = price)) +
geom_point(aes(color = cut), size = 2, shape = 21, alpha = 0.6)
In cases where you don’t want the object on the left-hand side to be the first argument in the function, use the dot (.) as a placeholder:
y %>% f(x, ., z)
is equivalent to f(x, y, z)
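To make both forms concrete, here is a small sketch using mtcars. The object names nested, piped and fit are invented for this example; lm() is used only because its first argument is a formula rather than the data, so the dot placeholder is needed.

```r
library(dplyr)

# Nested calls have to be read from the inside out
nested <- select(filter(mtcars, cyl == 6), mpg, hp)

# The same steps with the pipe read left-to-right
piped <- mtcars %>%
  filter(cyl == 6) %>%
  select(mpg, hp)

identical(nested, piped)

# lm() takes the formula first, so the data is passed via the dot placeholder
fit <- mtcars %>% lm(mpg ~ wt, data = .)
coef(fit)
```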
But there will be more of this tomorrow!
gather() and spread()
mean_iris <- iris %>%
group_by(Species) %>%
summarise_all(mean)
mean_iris
## # A tibble: 3 × 5
## Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 setosa 5.006 3.428 1.462 0.246
## 2 versicolor 5.936 2.770 4.260 1.326
## 3 virginica 6.588 2.974 5.552 2.026
Suppose I want to plot each of these flower attributes on a chart vs Species?
This is the wrong way to do it:
mean_iris %>%
ggplot(aes(x = Species)) +
geom_point(aes(y = Sepal.Length), color = "red") +
geom_point(aes(y = Sepal.Width), color = "blue") +
geom_point(aes(y = Petal.Length), color = "black") +
geom_point(aes(y = Petal.Width), color = "dark green")
Data should be converted so that data points are tabulated as key-value pairs. This is what the gather() function is for:
long_iris <- gather(mean_iris, key = flower_att, value = measurement, -Species)
head(long_iris)
## # A tibble: 6 × 3
## Species flower_att measurement
## <fctr> <chr> <dbl>
## 1 setosa Sepal.Length 5.006
## 2 versicolor Sepal.Length 5.936
## 3 virginica Sepal.Length 6.588
## 4 setosa Sepal.Width 3.428
## 5 versicolor Sepal.Width 2.770
## 6 virginica Sepal.Width 2.974
long_iris %>%
ggplot(aes(x = Species, y = measurement, color = flower_att)) +
geom_point()
spread() does the opposite of gather():
wide_iris <- spread(long_iris, key = flower_att, value = measurement)
str(wide_iris)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 5 variables:
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
## $ Petal.Length: num 1.46 4.26 5.55
## $ Petal.Width : num 0.246 1.326 2.026
## $ Sepal.Length: num 5.01 5.94 6.59
## $ Sepal.Width : num 3.43 2.77 2.97
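In newer versions of tidyr (1.0.0 and later, which is an assumption about your installation), gather() and spread() still work but have been superseded by pivot_longer() and pivot_wider(). A sketch of the same round trip, with long_iris2 and wide_iris2 as made-up names:

```r
library(dplyr)
library(tidyr)

mean_iris <- iris %>%
  group_by(Species) %>%
  summarise_all(mean)

# Wide to long: one row per Species / attribute pair
long_iris2 <- pivot_longer(mean_iris, cols = -Species,
                           names_to = "flower_att", values_to = "measurement")

# Long to wide: back to one column per attribute
wide_iris2 <- pivot_wider(long_iris2,
                          names_from = flower_att, values_from = measurement)

dim(long_iris2)
dim(wide_iris2)
```

Note that the row order of long_iris2 may differ from the gather() result, although the key-value pairs are the same.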
ggsave()
p1 <- long_iris %>%
ggplot(aes(x = Species, y = measurement, color = flower_att)) +
geom_point()
ggsave("p1.png", plot = p1, width = 6, height = 4, units = "in", dpi = 300)
ggsave() works out the format to save as from the file extension. It accepts .eps/.ps, .tex (pictex), .pdf, .jpeg, .tiff, .png, .bmp, .svg and (only on Windows) .wmf.
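For example, changing only the file extension switches the output to a vector format. The sketch below saves a small made-up plot (p_demo) as a PDF in a temporary directory; the file name and dimensions are arbitrary choices for this illustration.

```r
library(ggplot2)

# A small plot to save, for illustration only
p_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# The .pdf extension tells ggsave() to write a PDF
out_file <- file.path(tempdir(), "p_demo.pdf")
ggsave(out_file, plot = p_demo, width = 15, height = 10, units = "cm")

file.exists(out_file)
```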
select(), filter(), mutate(), rename()
ggplot(), aes(), geom_*(), scale_*(), xlab(), ylab(), ggtitle(), theme_*(), ggsave()
gather(), spread()
%>%